
Conversation

@vasqu (Contributor) commented Oct 24, 2025

Non-vmap creation of attention masks. This works with all our base masks, and we only fall back to vmap for patterns we cannot guarantee (i.e. additional `and`/`or` masks).

Note:

  • Non-vmap works with every mask that is purely index-based (see the sketch after this list)
  • Merged the old/new SDPA mask creation under one function --> easier maintenance, imo
  • ExecuTorch no longer needs an additional masking fn
  • Lifts some restrictions on older torch versions, e.g. chunked attention with padding, packed attention masks, etc.
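
For illustration, here is a minimal sketch of what index-based (non-vmap) mask construction looks like. The helper name `build_causal_mask` and the exact shapes are assumptions for this example, not the actual code in this PR:

```python
import torch

def build_causal_mask(batch_size: int, q_len: int, kv_len: int) -> torch.Tensor:
    # Hypothetical helper: index-based construction broadcasts position index
    # tensors directly instead of evaluating a mask callable per element
    # through torch.vmap.
    q_idx = torch.arange(q_len).view(-1, 1)    # (q_len, 1)
    kv_idx = torch.arange(kv_len).view(1, -1)  # (1, kv_len)
    causal = kv_idx <= q_idx                   # (q_len, kv_len), bool
    # Expand to the (batch, num_heads, q_len, kv_len) layout SDPA expects.
    return causal[None, None, :, :].expand(batch_size, 1, q_len, kv_len)

print(build_causal_mask(2, 4, 4)[0, 0])
```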

Fixes #41639

cc @jiqing-feng @IlyasMoutawwakil

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu vasqu changed the title [WIP][Masking] Non-vmap default for attention masks [Attn Masks] Non-vmap default for attention masks Oct 29, 2025
@vasqu vasqu marked this pull request as ready for review October 29, 2025 11:06
```diff
     return cache


-def sdpa_mask_without_vmap(
```
@vasqu (Contributor, Author) commented:

No longer needed, as vmap was the reason we needed this workaround in the first place.

```diff
     NOTE: It is important to keep an index-based version for non-vmap expansion.
     """
-    return q_idx.new_ones((), dtype=torch.bool)
+    return q_idx >= 0
```
@vasqu (Contributor, Author) commented:

As noted above, for non-vmap we need this as an index-based version.
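
A minimal sketch of the shape difference (the index grids below are illustrative assumptions, not the PR's actual expansion code):

```python
import torch

# Broadcast-ready index grids, as a non-vmap expansion would build them.
q_idx = torch.arange(6).view(-1, 1)   # (q_len, 1)
kv_idx = torch.arange(6).view(1, -1)  # (1, kv_len)

scalar = q_idx.new_ones((), dtype=torch.bool)  # shape: torch.Size([])
index_based = q_idx >= 0                       # shape: torch.Size([6, 1])

# Both are "always True", but only the index-based form keeps a shape that
# tracks the position grid, so it broadcasts like any other mask term:
print((index_based & (kv_idx <= q_idx)).shape)  # torch.Size([6, 6])
```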

```diff
-    causal_mask |= torch.all(~causal_mask, dim=-1, keepdim=True)
-    return causal_mask
+    attention_mask = attention_mask | torch.all(~attention_mask, dim=-1, keepdim=True)
```
@vasqu (Contributor, Author) commented:
I encountered issues with the in-place version, where we'd need a clone (e.g. when using SWA). This is safer.
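
For context, a minimal repro of the aliasing problem that in-place ops hit on expanded views; this is an assumed illustration of the failure mode, not the exact SWA code path:

```python
import torch

# An expanded view shares memory across the expanded dim, so PyTorch
# rejects in-place writes into it.
mask = torch.zeros(1, 4, dtype=torch.bool).expand(2, 4)
try:
    mask |= torch.all(~mask, dim=-1, keepdim=True)
except RuntimeError as e:
    print(e)  # more than one element of the written-to tensor refers to a single memory location

# The out-of-place form allocates a fresh tensor: no clone() needed.
mask = mask | torch.all(~mask, dim=-1, keepdim=True)
print(mask.shape)  # torch.Size([2, 4])
```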


Successfully merging this pull request may close: MarianMTModel performance regression due to Bidirectional masks (#41639).
